Feature Selection for Effective Calculation of a Similarity Measure

Authors

Tomoya Ogawa

Tohgoroh Matsui

Nobuhiro Inuzuka

Hirohisa Seki

Abstract

In any application that mines databases for useful information, a growing number of features (attributes) degrades results and increases computation time. We propose a feature selection technique that reduces computation time without degrading the quality of mining. We examine an algorithm called Iterated Contextual Distances (ICD) [1], identify its problems in practical applications, and propose a feature selection method that mitigates them. We then demonstrate the effect of the feature selection through experiments on a real dataset.

Related articles

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...

Context Feature Selection for Distributional Similarity

Distributional similarity is a widely used concept to capture the semantic relatedness of words in various NLP tasks. However, accurate similarity calculation requires a large number of contexts, which leads to impractically high computational complexity. To alleviate the problem, we have investigated the effectiveness of automatic context selection by applying feature selection methods explore...

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...

A Geometric View of Similarity Measures in Data Mining

The main objective of data mining is to acquire information from a set of data for prospective applications using a measure. The concern is that one often has to deal with large-scale data. Several dimensionality reduction techniques, such as various feature extraction methods, have been developed to resolve the issue. However, the geometric view of the applied measure, as an additional consid...

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset from among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected according to some measu...

Publication date: 2002
